1. INTRODUCTION

Diabetes is one of the most feared diseases prevalent in today’s lifestyle. It is a disease in which the
blood glucose, or blood sugar, level is too high [1]. The glucose comes from the food that has been eaten.
The disease has become a problem not only for people in developed countries but also for people in
developing and underdeveloped countries. It is a problem for people across the globe, regardless of a
country's level of development [2]. Several factors are commonly linked to the development of diabetes.
High blood pressure is one of the most discussed. Adhikari et al. [3] pointed out that high blood pressure
can be a factor that causes diabetes in a person. In addition, diabetic and non-diabetic persons are
expected to differ in chronological age: people more than forty years old are among the candidates to be
diagnosed with diabetes [4]. Obesity is another factor associated with diabetes. A person with obesity,
that is, one who is overweight, can be diagnosed with diabetes. At the micro level, low serum creatinine is
a risk factor for type-2 diabetes in Caucasian morbidly obese patients, independent of age, gender, family
history of diabetes, anthropometric measures, hypertension, and current smoking. Rai and Jeganathan [5-7]
indicate that a person with a higher total serum cholesterol is a person with a diabetic condition, with
or without hypertension. Other factors can also lead to diabetes, such as genetic inheritance or family
history [8]. Despite the long list of factors associated with diabetes, there has been no solid agreement
on which factors are more important than the others. Many risk factors have been suggested, but the
identification of a definitive factor remains inconclusive.

Many studies have investigated the relationships between factors and the risk of developing diabetes.
Yukako et al. [9], for example, examined the association between lifestyle and risk of diabetes among
Japanese using Cox proportional hazards regression models. Their data indicate that healthy behaviors
prevent the incidence of diabetes. That study employed a multivariate regression model with a large number
of respondents. Bener et al. [10] examined the relationship between blood lead levels, blood pressure and
diabetes, as well as other selected social and biochemical factors, among workers in the United Arab
Emirates. This comparative study used descriptive statistics, namely the median and geometric mean, to
observe the differences between two groups of respondents. The study supports the hypothesis of a positive
association between lead exposure, high blood pressure and risk of diabetes and heart disease. In another
related study, Fukuoka et al. [11] explored the perceived risk for diabetes and heart attack and the
associated health status of Caucasian, Filipino, Korean, and Latino Americans without diabetes. A
cross-sectional survey was conducted among urban adults and analysed using descriptive statistics. They
found that older age, physical inactivity, smoking, and low HDL levels were not associated with risk of
diabetes. Most research of this type has focused on statistical approaches in which the number of
respondents is large and the data are assumed to be normally distributed. However, in many cases these
risk factors come with incomplete and vague information. Physical activity, for example, is difficult to
express as an exact measurement because of incomplete information and other intangible matters. In this
sense, it is more appropriate to introduce qualitative and intelligent evaluation. Unlike previous
research, where statistical approaches were mainly applied, this paper aims to identify the key factors
that contribute to the development of diabetes using a fuzzy inference system (FIS). FIS is one example of
an intelligent evaluation system in which logic-based rules are the main engine. Within the architecture
of FIS, factors normally associated with the development of diabetes are defined as input variables, while
the level of risk of diabetes is defined as the output variable.

Such systems have been applied to medical diagnosis in various ways. These applications can be very
helpful for classification tasks, offline process simulation and diagnosis, online decision support tools
and process tools. Ibrahim et al. [12], for example, built a classification model using fuzzy logic
classifiers. They proposed a partitioning of membership functions in a fuzzy logic inference system in
which a clustering method defines partitions of equal clusters based on similarity. Lai et al. [13]
proposed a system based on FIS that measures physiological parameters continuously to provide
hypoglycaemia detection for Type 1 diabetes mellitus patients. The heart rate and the corrected interval
of the electrocardiogram signal were among the inputs used to detect hypoglycaemic episodes. An
intelligent optimiser was designed to optimise the FIS parameters that govern the membership functions and
the fuzzy rules. Singla [14] compared the performance of Mamdani-type and Sugeno-type fuzzy models of FIS
for the diagnosis of diabetes. A FIS optimised with a genetic algorithm was used by Moallem et al. [15]
for face detection. Khanale & Ambilwade [16] presented a FIS that diagnoses thyroid diseases. De Paula
Castanho [17] proposed a fuzzy system to predict the pathological stage of prostate cancer. Banerjee et
al. [18] applied a method based on fuzzy If-Then rules to diagnose patients with oral precancers. These
wide-ranging applications add strong evidence of the significant role of FIS in diagnosing diseases. This
study extends the advantages of inference-based systems to the case of diagnosing diabetic patients.


2. A BRIEF OVERVIEW OF FUZZY INFERENCE SYSTEM

A fuzzy inference system (FIS) is the process of formulating the mapping from a given input to an output
using fuzzy logic operators and fuzzy rules [19]. The mapping then provides a basis from which decisions
can be made or patterns can be identified. FIS is one of the best-known applications of fuzzy logic and
fuzzy set theory [20]. In fuzzy set theory, a variable whose values are words rather than numbers is
called a linguistic variable [21]. The system is mainly characterised by the membership functions of the
input and output linguistic variables, the fuzzy logic operators ‘or’ and ‘and’, and if-then rules. The
quality of the output depends on the types of fuzzy sets used to define the input variables and on the
configuration of the fuzzy rules. The strength of FIS lies in its twofold identity: it handles linguistic
concepts in both input and output variables, and it is a universal approximator able to perform non-linear
mappings between inputs and outputs. These two strengths have been used to design two types of FIS, the
Mamdani type and the Sugeno type. The main components of the system are the fuzzification interface, the
inference engine and the defuzzification interface.

The basic architecture of FIS, which comprises three components and rules, can be seen in Figure 1.

Figure 1. Basic architecture of a fuzzy inference system


Both the Mamdani-type and Sugeno-type inferences run in accordance with the architecture in Figure 1,
where fuzzification and defuzzification are the two main processes and inference is the third. In Mamdani
inference, the output is defined by a membership function, whereas in Sugeno inference the output is
expressed as a single polynomial in the input variables. The Mamdani inference has a common structure with
different rule bases for input and output.
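
For reference, the two rule forms can be written side by side. This is the standard textbook formulation rather than notation taken from this paper; the coefficients p_{k,i} and the firing strengths w_k are symbols introduced here for illustration only.

$$
\text{Mamdani rule } k:\quad \text{IF } x_1 \text{ is } X_1^{(k)} \text{ AND } \dots \text{ AND } x_n \text{ is } X_n^{(k)} \text{ THEN } y \text{ is } Y^{(k)}
$$

$$
\text{Sugeno rule } k \text{ (first order)}:\quad \text{IF } x_1 \text{ is } X_1^{(k)} \text{ AND } \dots \text{ AND } x_n \text{ is } X_n^{(k)} \text{ THEN } y_k = p_{k,0} + \sum_{i=1}^{n} p_{k,i}\, x_i,
\qquad y = \frac{\sum_k w_k\, y_k}{\sum_k w_k}
$$

In the Mamdani form the consequent Y^{(k)} is itself a fuzzy set, so a separate defuzzification step is required; in the Sugeno form the crisp output is obtained directly as a weighted average of the rule outputs.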

The main feature of Mamdani inference is that membership functions are used for both input and output
data. FIS can be envisioned as a processing tool driven by the knowledge supplied to the system. The
knowledge base provides the membership functions and fuzzy rules needed for the process. Five shapes of
membership functions are most commonly used in the system: triangular, trapezoidal, Gaussian, generalised
bell, and sigmoidal [22, 23]. Regardless of shape, the membership values of these functions lie in the
closed interval [0, 1]. For inference rules, the most common way of writing a fuzzy rule is as follows.

IF premise (antecedent), THEN conclusion (consequent). 

A system built from rules of this form is commonly referred to as an IF-THEN rule-based system. When there
is more than one input, multiple antecedents can be combined with the connectives ‘AND’ or ‘OR’. For
example, with three inputs and one output, the rule can be written as

IF (x1 is X1 AND x2 is X2 AND x3 is X3), THEN y1 is Y1,

where x1, x2, and x3 are input variables and X1, X2, and X3 are their respective membership functions.
The output variable is represented by y1, and Y1 is the membership function of the output variable.
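
A minimal sketch of how such a conjunctive rule can be evaluated, assuming the common choice of the minimum operator for AND and the maximum operator for OR (the paper does not state which operators were configured). The function names and the membership degrees below are hypothetical and chosen only for illustration.

```python
def fuzzy_and(*degrees):
    """Conjunction of membership degrees using the minimum operator."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Disjunction of membership degrees using the maximum operator."""
    return max(degrees)


# Hypothetical membership degrees of three crisp inputs in X1, X2 and X3.
mu_x1, mu_x2, mu_x3 = 0.7, 0.4, 0.9

# Firing strength of: IF (x1 is X1 AND x2 is X2 AND x3 is X3) THEN y1 is Y1
firing_strength = fuzzy_and(mu_x1, mu_x2, mu_x3)
print(firing_strength)  # 0.4
```

The firing strength is the degree to which the antecedent holds; in Mamdani inference it limits how strongly the output set Y1 contributes to the aggregated result.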

During the processing stage, numerical crisp variables are the input of the system. These variables pass
through a fuzzification stage, where they are transformed into linguistic variables that become the fuzzy
input for the inference engine. The rules of the inference engine transform the fuzzy input into fuzzy
output. The defuzzification stage then converts these linguistic results into numerical values that become
the output of the system [24]. The most commonly used defuzzification technique is the centre of mass,
which yields a single crisp number. It is computed using the following equation,


$$
z^{*} = \frac{\sum_{j} \mu_{c}(z_{j})\, z_{j}}{\sum_{j} \mu_{c}(z_{j})}
$$

where $z^{*}$ is the centre of mass and $\mu_{c}$ is the membership in class c at value $z_{j}$.
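
The centre-of-mass computation can be sketched in a few lines of Python. This is an illustrative discrete version, assuming the aggregated output set has been sampled at a finite number of points; the function name `centroid` and the sample values are hypothetical, not taken from the study.

```python
def centroid(z_values, memberships):
    """Discrete centre of mass: sum(mu * z) / sum(mu)."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(m * z for z, m in zip(z_values, memberships)) / total


# Hypothetical aggregated output fuzzy set sampled at five points.
z = [0.1, 0.3, 0.5, 0.7, 0.9]
mu = [0.0, 0.5, 1.0, 0.5, 0.0]

z_star = centroid(z, mu)
print(round(z_star, 6))  # 0.5
```

Because the sampled set is symmetric about 0.5, the centre of mass falls at 0.5; an asymmetric set would pull the crisp value toward the side with more membership mass.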


The centre of mass is a crisp value, so interpretation of the output is straightforward.
The research framework, which explains the research protocol and method, is presented
in the next section.


3. RESEARCH FRAMEWORK 

The research framework is presented to aid in conceptualizing how the input variables interact with the
risks of diabetic patients. Data were retrieved from the diabetes clinical audit reports of fifty patients
with diabetes at a government-funded hospital in the state of Selangor, Malaysia, after obtaining
permission from the hospital authority. The clinical audit report contains information about factors that
affect the development of diabetes. Body mass index (BMI), age, blood pressure, creatinine and serum
cholesterol are among the information that can be captured from the report.

The system described in Section 2 is used to translate the linguistic terms of the input variables using
the IF-THEN rules and aggregate the output into crisp values. The system is expected to identify the
influential factors contributing to the development of diabetes. The research framework is summarised in
Figure 2.


[Figure 2 flow: Clinical audit report of diabetic patients -> Crisp data -> Input variables: BMI (X1),
age (X2), blood pressure (X3), creatinine (X4), serum cholesterol (X5) -> Fuzzy inference system
(processing: fuzzy logic operations and if-then rules) -> Output variable: risk of developing
diabetes (Y1)]


Figure 2. Framework of the research 


The following section presents the full computation of FIS on the input data and the results obtained for
the factors that affect diabetic patients.


4. IMPLEMENTATION AND RESULTS 

A fuzzy inference system takes input, applies fuzzy rules, and produces explicit output. The Mamdani
inference method is suitable for identifying factors of diabetes because both the input and the output of
the FIS are represented by the values of linguistic variables. The detailed computations for the case are
summarised in the following steps.
Step 1: Defining Input and Output Variables

The five input variables are BMI, age, blood pressure, creatinine, and serum cholesterol, which are
translated into descriptive words or linguistic scales. The five input variables are connected to the
Mamdani-type system and linked to the output, which is the level of risk of developing diabetes. The level
of risk is measured on the basis of the five risk factors. Given the defined functional and operational
characteristics of the system, the crisp input data from this experiment need to be fuzzified.
Step 2: Defining Fuzzy Sets for System Variables 

System variables need to be fuzzified to obtain fuzzy membership values. The system recognizes the input
and output variables and defines their memberships. The Gaussian membership function is used to define the
five input variables, whereas the triangular membership function is used to define the level of risk, the
output variable. Memberships for the risk of diabetes, for example, are defined in three linguistic terms,
‘High’, ‘Medium’ and ‘Low’. Ranges of data are given as open intervals for the input variables and closed
intervals for the output variable, in accordance with the type of membership function used. Descriptions
of variables, linguistic scales and ranges of crisp data in membership functions are summarised in Table 1.

The crisp value of each variable is entered into the FIS editor. For example, Figure 3 shows the range of
crisp values for the variable BMI, defined in accordance with the Gaussian membership function. This
function has the same bell shape as the normal distribution function.

Table 1. Descriptions of variables

Variables                              Linguistic terms   Crisp values of membership functions (interval)
BMI (kg/m²), X1                        Low                (17.8, 29.4)
                                       Medium             (18, 41)
                                       High               (29.4, 42)
Age (year), X2                         Adult              (32, 49)
                                       Old                (40, 60)
                                       Very Old           (50, 66)
Systolic Blood Pressure (mmHg), X3     Low                (90, 120)
                                       Normal             (100, 145)
                                       High               (140, 167)
Creatinine (µmol/L), X4                Low                (57, 93.5)
                                       Normal             (60, 120)
                                       High               (110, 130)
Serum cholesterol (mmol/L), X5         Low                (1.036, 2.5)
                                       Normal             (1.058, 6.55)
                                       High               (5.5, 9.6)
Risk of diabetes, Y1                   Low                [0.1, 0.39]
                                       Normal             [0.40, 0.60]
                                       High               [0.61, 0.99]
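
As an illustration of fuzzification, the BMI intervals in Table 1 can be turned into Gaussian membership functions. The paper does not report the centre and spread used in the FIS editor, so the sketch below ASSUMES the centre is the interval midpoint and the standard deviation is a quarter of the interval width; `gaussian_mf`, `from_interval` and `fuzzify` are hypothetical helper names.

```python
import math


def gaussian_mf(x, centre, sigma):
    """Gaussian membership degree of x; equals 1.0 at the centre."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def from_interval(low, high):
    """ASSUMPTION: centre at the midpoint, sigma equal to a quarter of the width."""
    return (low + high) / 2.0, (high - low) / 4.0


# BMI intervals as listed in Table 1.
BMI_TERMS = {"low": (17.8, 29.4), "medium": (18.0, 41.0), "high": (29.4, 42.0)}


def fuzzify(x, terms):
    """Return the membership degree of x in every linguistic term."""
    return {
        name: gaussian_mf(x, *from_interval(lo, hi))
        for name, (lo, hi) in terms.items()
    }


bmi_degrees = fuzzify(29.4, BMI_TERMS)
dominant_term = max(bmi_degrees, key=bmi_degrees.get)
print(dominant_term)  # medium
```

Under these assumed parameters a BMI of 29.4 belongs mostly to ‘medium’ and only weakly to ‘low’ and ‘high’; different spread choices would change the degrees but not the overlapping structure of the three terms.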


Figure 3. Input membership function for BMI 


The range of each input variable is set to the minimum and maximum values of the data obtained from the
clinical audit report. The linguistic values are {low, medium, high}.
The output variable is defined similarly. Unlike the input variables, the output variable uses the
triangular membership function. The risk of developing diabetes is assumed to be a linear function that
can be characterised by two right-angled triangles and an isosceles triangle. Triangular fuzzy sets
satisfy a zero-error reconstruction criterion for the output interface [25]. The crisp values for the
output variable are shown in Figure 4.

Figure 4. Output membership function

The crisp values of the output variable have been set to intervals of the linguistic values {low, medium, high}.
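
The triangular output sets can be sketched as follows. The vertices are an ASSUMPTION: the feet are taken from the Table 1 intervals for risk (where the middle term is labelled ‘Normal’), the two outer sets are right-angled with their peaks at the interval ends, and the middle set peaks at 0.50. The names `triangular_mf` and `risk_memberships` are hypothetical.

```python
def triangular_mf(x, a, b, c):
    """Triangle with feet a and c and peak b.

    Setting a == b or b == c gives a right-angled triangle.
    """
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# ASSUMPTION: (foot, peak, foot) vertices derived from the Table 1 risk intervals.
RISK_TERMS = {
    "low": (0.10, 0.10, 0.39),     # right-angled, peak at the left end
    "normal": (0.40, 0.50, 0.60),  # isosceles
    "high": (0.61, 0.99, 0.99),    # right-angled, peak at the right end
}


def risk_memberships(y):
    """Membership degree of a crisp risk value y in each output term."""
    return {name: triangular_mf(y, *abc) for name, abc in RISK_TERMS.items()}


degrees = risk_memberships(0.45)
```

With these assumed vertices a risk value of 0.45 lies halfway up the left side of the ‘normal’ triangle and outside the other two sets.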
Step 3: Creating rules 

The next step is to create IF-THEN rules that describe the behavior of the system. The rules are designed
to describe the importance of the factors for the possible level of risk. For example, the rule created
specifically for patient 1 is as follows.


If BMI is medium and age is adult and blood pressure is normal and creatinine is low and serum
cholesterol is low, then risk of diabetes is normal.


According to grid partitioning, 3^5 = 243 possible rules could be generated, since each of the five input
variables has three linguistic terms. However, based on the knowledge of a medical expert in diabetes,
fifty rules were created to describe the relationship between the input variables and the risk of diabetes.
Figure 5 shows a sample of the rules.


1. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is normal) (1)
2. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is high) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is normal) (1)
3. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is high) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
4. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is high) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
5. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is high) (1)
6. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is low) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is high) (1)
7. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is normal) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is high) (1)
8. If (BMI is high) and (AGE is adult) and (BLOODPRESSURE is high) and (CREATININE is normal) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
9. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is normal) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
10. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is low) and (CREATININE is normal) and (SERUMCHOLESTROL is low) then (RISKOFDIABETES is normal) (1)
11. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is low) and (CREATININE is low) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is high) (1)
12. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is normal) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
13. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is high) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
14. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is high) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is low) (1)
15. If (BMI is high) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is normal) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
16. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is high) (1)
17. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is normal) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
18. If (BMI is high) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is normal) then (RISKOFDIABETES is normal) (1)
19. If (BMI is low) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is high) then (RISKOFDIABETES is high) (1)
20. If (BMI is medium) and (AGE is adult) and (BLOODPRESSURE is normal) and (CREATININE is low) and (SERUMCHOLESTROL is high) then (RISKOFDIABETES is high) (1)


Figure 5. Fuzzy rules for input and output 
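
The grid-partitioning count of 3^5 = 243 quoted above can be checked by enumerating every combination of linguistic terms. The sketch below uses the variable names that appear in the rule editor and the terms listed in Table 1; it only enumerates antecedents and does not assign consequents, which in the study came from a medical expert.

```python
from itertools import product

# Three linguistic terms for each of the five input variables (Table 1).
TERMS = {
    "BMI": ("low", "medium", "high"),
    "AGE": ("adult", "old", "very old"),
    "BLOODPRESSURE": ("low", "normal", "high"),
    "CREATININE": ("low", "normal", "high"),
    "SERUMCHOLESTROL": ("low", "normal", "high"),
}

# Every possible antecedent under grid partitioning.
antecedents = list(product(*TERMS.values()))
print(len(antecedents))  # 243


def format_antecedent(combo):
    """Render one combination in the style of the rule editor."""
    return " and ".join(f"({name} is {term})" for name, term in zip(TERMS, combo))


print(format_antecedent(antecedents[0]))
```

The fifty expert rules therefore cover roughly a fifth of the full grid, which keeps the rule base small at the cost of leaving some input combinations without a matching rule.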


The inference rules set the premises that create the output. The output then needs to be defuzzified to
obtain a crisp value.


Step 4: Defuzzification 

The defuzzification step is needed to map the results for all input data onto the three linguistic terms
used to observe the risk of diabetes. The defuzzification process transforms the fuzzy set into a crisp
value that is meaningful to the end-user. For example, if a patient’s BMI is 29.4, age is 49, blood
pressure is 134, creatinine is 93.5 and serum cholesterol is 6.55, the defuzzification result is 0.5.
Thus, based on the defined output, the level of risk is ‘normal’. Defuzzification for the rest of the
patients was carried out in a similar fashion. A sample of the defuzzification process is shown in Figure 6.
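
The Mamdani steps that lead to such a crisp value can be sketched end to end: each output set is clipped at the firing strength of its rules (min), the clipped sets are aggregated (max), and the centroid of the result is taken. The output vertices and firing strengths below are ASSUMPTIONS for illustration; they are not the values produced by the fifty-rule system, and `mamdani_centroid` is a hypothetical helper.

```python
def triangular_mf(x, a, b, c):
    """Triangle with feet a and c and peak b (a == b or b == c is right-angled)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# ASSUMPTION: vertices taken from the Table 1 risk intervals; peaks are illustrative.
RISK_TERMS = {
    "low": (0.10, 0.10, 0.39),
    "normal": (0.40, 0.50, 0.60),
    "high": (0.61, 0.99, 0.99),
}


def mamdani_centroid(fired, steps=1000):
    """Clip each output set at its firing strength, aggregate by max, return the centroid.

    `fired` maps an output term to the strongest firing strength among its rules.
    """
    numerator = denominator = 0.0
    for i in range(steps + 1):
        z = i / steps
        mu = max(
            min(strength, triangular_mf(z, *RISK_TERMS[term]))
            for term, strength in fired.items()
        )
        numerator += mu * z
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired")
    return numerator / denominator


# Hypothetical firing strengths, not values computed from the patient data.
only_normal = mamdani_centroid({"normal": 0.8})
mostly_high = mamdani_centroid({"normal": 0.3, "high": 0.9})
```

When only ‘normal’ rules fire, the clipped set is symmetric and the centroid sits at its midpoint; when ‘high’ rules fire more strongly, the centroid is pulled into the ‘high’ interval.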

The risks of diabetes in three levels are finally obtained after completing the defuzzification process.
The Low, Normal and High levels are described by frequency analyses for each input variable (factor).
These show the level of risk in terms of the number of rules, from which the dominant rules for each
factor can be identified. For example, 58 percent of the rules for BMI relate to the Medium level.
Table 2 shows the number of rules and percentages related to each level of risk.


Table 2. Number of rules and percentage based on level of risk and factors

Linguistic term   BMI (%)    Age (%)    Blood Pressure (%)   Creatinine (%)   Serum Cholesterol (%)
Low               17 (34)    0 (0)      6 (12)               24 (48)          20 (40)
Medium            29 (58)    3 (6)      16 (32)              20 (40)          3 (6)
High              4 (8)      47 (94)    28 (56)              6 (12)           27 (54)
Total             50 (100)   50 (100)   50 (100)             50 (100)         50 (100)

Figure 6. Defuzzification process from input to output


It can be seen that the risk of diabetes is Low if the patient has higher creatinine. The patient is at
High risk if serum cholesterol, blood pressure and age are also high. This finding suggests that the three
highest contributing factors toward diabetes are age, blood pressure and serum cholesterol.
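
The percentages in Table 2 and the ranking of factors follow directly from the rule counts. A short sketch of that arithmetic is given below, using the counts from Table 2; ranking by the share of rules at the High level is one plausible reading of how the top three factors were chosen, not a procedure stated in the paper.

```python
# Rule counts per factor and level, as reported in Table 2 (50 rules per factor).
RULE_COUNTS = {
    "BMI": {"Low": 17, "Medium": 29, "High": 4},
    "Age": {"Low": 0, "Medium": 3, "High": 47},
    "Blood Pressure": {"Low": 6, "Medium": 16, "High": 28},
    "Creatinine": {"Low": 24, "Medium": 20, "High": 6},
    "Serum Cholesterol": {"Low": 20, "Medium": 3, "High": 27},
}


def percentages(counts):
    """Convert rule counts into percentages of the factor's total."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


# Share of rules at the High level for each factor.
high_share = {factor: percentages(c)["High"] for factor, c in RULE_COUNTS.items()}

# Three factors with the largest High share.
top_three = sorted(high_share, key=high_share.get, reverse=True)[:3]
print(top_three)  # ['Age', 'Blood Pressure', 'Serum Cholesterol']
```

The same computation reproduces the 58 percent quoted for BMI at the Medium level (29 of 50 rules).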

The next analysis adds to the picture of the association between the risk of diabetes and its related
factors. From the defuzzification viewer, relationships between input and output can be visualised
through graphical modelling. The relationship is shown as a three-dimensional surface that represents the
mapping of the membership functions. It allows us to see the output surface for two inputs at a time [16].
Figure 7 shows the surface viewer diagram for the combination of the inputs age and blood pressure.


Figure 7. Relationship between age and blood pressure against risk of diabetes


The surface indicates that the risk is low when the blood pressure is in the range [110, 150], despite
increases in age. This is indicated by the dark blue surface area.

The relationship between blood pressure and serum cholesterol against the risk of diabetes can be
viewed in Figure 8.


Figure 8. Relationship between serum cholesterol and blood pressure against risk of diabetes


The surface shows uneven curves, which indicate that the risk of diabetes is higher when blood pressure is
higher, even when serum cholesterol readings stay constant. The three-dimensional surface also indicates
low risk of diabetes if the blood pressure readings are in the range [110, 130] with low serum
cholesterol. Three-dimensional surface viewer diagrams of the relationships among other risk factors can
be displayed in a similar way. However, these diagrams are limited to the interactions between two risk
factors and the level of risk.


5. CONCLUSION 

The fuzzy inference system is a model with well-defined input and output along with a processing module
that carries out all the computation at the linguistic level. It can also model the qualitative aspects of
human knowledge and reasoning processes without employing precise quantitative analyses. This paper has
shown the capability of the fuzzy inference system for elucidating the association between the risk level
of diabetes and the risk factors using IF-THEN rules in an architecture of input variables, a rule
inference engine and an output variable. Data available from fifty clinical reports of patients with
diabetes were used to identify risk factors of diabetes. Linguistic terms for five input variables and
three linguistic terms for the single output of risk were defined in the architecture. Fifty carefully
defined rules were employed to connect the input and output. This study contributes two main findings from
the use of FIS. First, the system identified the three most influential risk factors as age, blood
pressure and serum cholesterol. Second, the system can suggest the level of risk based on interactions of
two factors. These data indicate that the risk of getting diabetes is higher when age, blood pressure and
serum cholesterol are also higher. The system permits fuzzy rules to become a useful tool for identifying
risk factors of diabetes across different groups of age, BMI, blood pressure, creatinine and serum
cholesterol.